Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets

Jacot, Arthur

arXiv.org Machine Learning

Technically speaking, no statistical model can be strictly better than another: each model can be optimal under certain assumptions. However, it has been argued that one statistical model outmatches all others up to constants: finding the function of minimal Kolmogorov complexity, i.e. the program with the shortest description length, that fits the data [42, 43]. Such an ideal statistical model would be at least as good as any model with a short description [20], which covers essentially every model that has been described within a single scientific paper. But this perfect statistical model appears unattainable, because Kolmogorov complexity is undecidable in general, let alone optimizable. In this paper, we prove results suggesting that Residual Network (ResNet) training implements a weaker version of this ideal model: greedily minimizing circuit complexity in the so-called Harder than Monte Carlo (HTMC) regime. These simplifications can be summarized as follows: Minimal Circuit Size Problem (MCSP): searching for an interpolating circuit with a minimal number of operations is "only" in NP.
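To make the MCSP objective concrete, here is a toy sketch (not from the paper) that computes the minimal number of AND/OR/NOT gates needed to realize a given Boolean function of two inputs. For simplicity it measures formula size (tree-shaped circuits, no gate sharing) and finds the optimum by exhaustive relaxation over the 16 possible truth tables; the function name and encoding are illustrative assumptions.

```python
from itertools import product

def min_circuit_size(target, n_inputs=2):
    """Minimal number of AND/OR/NOT gates in a tree-shaped circuit (formula)
    computing `target`. `target` is a tuple of 2**n_inputs bits, indexed by
    the input assignments in lexicographic order. Toy illustration only:
    brute-force relaxation over all truth tables, feasible for tiny n_inputs."""
    assignments = list(product((0, 1), repeat=n_inputs))
    # Each input wire is a circuit of cost 0.
    best = {tuple(a[i] for a in assignments): 0 for i in range(n_inputs)}
    changed = True
    while changed:  # relax until no cheaper circuit is found for any table
        changed = False
        for t, c in list(best.items()):
            neg = tuple(1 - v for v in t)  # NOT gate: one extra gate
            if c + 1 < best.get(neg, float("inf")):
                best[neg] = c + 1
                changed = True
        snapshot = list(best.items())
        for ta, ca in snapshot:
            for tb, cb in snapshot:
                for op in (lambda a, b: a & b, lambda a, b: a | b):
                    cand = tuple(op(x, y) for x, y in zip(ta, tb))
                    cost = ca + cb + 1  # binary gate joins two subcircuits
                    if cost < best.get(cand, float("inf")):
                        best[cand] = cost
                        changed = True
    return best.get(tuple(target))

print(min_circuit_size((0, 0, 0, 1)))  # x0 AND x1 -> 1 gate
print(min_circuit_size((0, 1, 1, 0)))  # x0 XOR x1 -> 4 gates over {AND, OR, NOT}
```

The search itself is exponential in general, while checking that a proposed circuit interpolates the data is cheap, which is exactly why MCSP sits in NP rather than being undecidable like Kolmogorov complexity.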
